Metadata for Integrating Chinese Text and Speech Documents in a Multi-media Retrieval System
Abstract
Multimedia documents place new requirements on conventional text retrieval systems. This paper presents a multimedia retrieval system that uses a content-based strategy to retrieve both text and speech documents. Its input can be either a sequence of spoken words, given as digitized waveforms, or a sequence of characters, and its output is a ranked list of text and/or speech documents. The system introduces a new metadata scheme designed specifically for both text and speech documents. The metadata is generated automatically, with special consideration given to the characteristics of Chinese. The approach is easy to implement, and preliminary tests give encouraging results.
Similar resources
A Multimedia Retrieval System for Retrieving Chinese Text and Speech Documents
Multimedia documents place new requirements on conventional text retrieval systems. This paper presents a multimedia retrieval system that uses a content-based strategy to retrieve both text and speech documents. Its input can be either a sequence of spoken words, given as digitized waveforms, or a sequence of characters, and its output is a ranked list of text and/or speech documents. In this ...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Soft indexing of speech content for search in spoken documents
The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. This technique explicitly takes into consideration the content uncertainty by means of using soft-hits. Indexing position information allows one to approxim...
Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents
The Informedia Digital Video Library Project at Carnegie Mellon University is making large corpora of video and audio data available for full content retrieval by integrating natural language understanding, image processing, speech recognition and information retrieval. Information retrieval from corpora of speech recognition output is critical to the project’s success. In this paper, we out...
A New Information Retrieval Method Suited to Texts Produced by Speech Recognition
This article introduces a pre-processing method for the task of retrieving speech-recognized texts. The inputs are a text corpus generated by a speech recognition system and a query; the goal is to search these documents for the query and find the relevant ones. A basic problem with typical speech-recognized text is that a percentage of it is recognized incorrectly. This results erroneously ass...
Publication date: 2002